COVSMA stands for Copernicus Satellites Versus Maladies. The current health crisis calls for an online tool that can monitor pollution levels; display alerts intended to trigger automatic government measures, such as days without cars and trucks that are not 100% electric; monitor the live impact of the measures taken; forecast COVID-19 risk due to PM2.5 exposure up to 4 days ahead; and predict new hospitalisations due to severe COVID-19 cases for every state or département. By analogy, we call the COVID-19 version of this tool COVSCO (Copernicus Satellites Versus COVID-19). We start with France and its 96 départements. A follow-up will apply the same methodology to severe respiratory diseases and expand the model and databases to a global scale.
nom numero time hospi reanim newhospi newreanim \
35 Ain 1.0 2020-05-14 137.0 8.0 4.0 0.0
36 Ain 1.0 2020-05-15 135.0 7.0 4.0 0.0
37 Ain 1.0 2020-05-16 134.0 6.0 1.0 0.0
38 Ain 1.0 2020-05-17 133.0 6.0 1.0 0.0
39 Ain 1.0 2020-05-18 132.0 6.0 1.0 0.0
... ... ... ... ... ... ... ...
35899 Haute-Corse 202.0 2021-04-13 28.0 7.0 4.0 1.0
35900 Haute-Corse 202.0 2021-04-14 28.0 7.0 0.0 2.0
35901 Haute-Corse 202.0 2021-04-15 26.0 7.0 0.0 0.0
35902 Haute-Corse 202.0 2021-04-16 25.0 6.0 0.0 0.0
35903 Haute-Corse 202.0 2021-04-17 23.0 5.0 0.0 0.0
deces gueris dep_num ... normo37davg normpm107davg normco7davg \
35 88.0 318.0 1.0 ... 0.586532 0.129071 0.234867
36 89.0 323.0 1.0 ... 0.577448 0.127923 0.236187
37 90.0 325.0 1.0 ... 0.570966 0.125429 0.236587
38 90.0 326.0 1.0 ... 0.567594 0.123235 0.236110
39 90.0 331.0 1.0 ... 0.563293 0.120713 0.237371
... ... ... ... ... ... ... ...
35899 84.0 403.0 202.0 ... 0.695757 0.127474 0.207173
35900 84.0 403.0 202.0 ... 0.695959 0.125188 0.209451
35901 84.0 405.0 202.0 ... 0.699680 0.122655 0.211129
35902 84.0 406.0 202.0 ... 0.700659 0.121988 0.212173
35903 84.0 408.0 202.0 ... 0.702050 0.121749 0.212114
normpm251Mavg normno21Mavg normo31Mavg normpm101Mavg normco1Mavg \
35 0.108579 0.048632 0.559299 0.094840 0.218359
36 0.119927 0.051150 0.496692 0.102941 0.226082
37 0.129652 0.051861 0.455272 0.105411 0.234060
38 0.127766 0.051226 0.458184 0.101608 0.239713
39 0.142927 0.049201 0.477197 0.113024 0.255846
... ... ... ... ... ...
35899 0.068316 0.023666 0.669700 0.090343 0.183506
35900 0.072675 0.023549 0.673435 0.086638 0.189423
35901 0.075989 0.023410 0.676995 0.080507 0.197941
35902 0.082391 0.023809 0.669926 0.079769 0.206397
35903 0.090115 0.025084 0.679878 0.079151 0.213158
pm25level pm25levelstring
35 0 Low
36 0 Low
37 0 Low
38 0 Low
39 0 Low
... ... ...
35899 0 Low
35900 0 Low
35901 0 Low
35902 0 Low
35903 0 Low
[32544 rows x 94 columns]
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 35904 entries, 0 to 35903
Data columns (total 94 columns):
 #   Column  Non-Null Count  Dtype
---  ------  --------------  -----
 0   nom  35904 non-null  object
 1   numero  35904 non-null  float64
 2   time  35904 non-null  datetime64[ns]
 3   hospi  35904 non-null  float64
 4   reanim  35904 non-null  float64
 5   newhospi  35904 non-null  float64
 6   newreanim  35904 non-null  float64
 7   deces  35904 non-null  float64
 8   gueris  35904 non-null  float64
 9   dep_num  35904 non-null  float64
 10  lon  35904 non-null  float64
 11  lat  35904 non-null  float64
 12  name  35904 non-null  object
 13  captial  35904 non-null  object
 14  area  35904 non-null  float64
 15  total  35904 non-null  float64
 16  density  35904 non-null  float64
 17  idx  35904 non-null  float64
 18  pm25  35904 non-null  float64
 19  no2  35904 non-null  float64
 20  o3  35904 non-null  float64
 21  co  35904 non-null  float64
 22  pm10  35904 non-null  float64
 23  longitude  35904 non-null  float64
 24  latitude  35904 non-null  float64
 25  Region_x  35904 non-null  object
 26  Departement_x  35904 non-null  object
 27  depnum_x  35904 non-null  float64
 28  ds  35904 non-null  object
 29  country  35904 non-null  object
 30  polygon_source  35904 non-null  object
 31  polygon_id  35904 non-null  object
 32  polygon_name  35904 non-null  object
 33  all_day_bing_tiles_visited_relative_change  35904 non-null  float64
 34  all_day_ratio_single_tile_users  35904 non-null  float64
 35  baseline_name  35904 non-null  object
 36  baseline_type  35904 non-null  object
 37  CovidPosTest  32640 non-null  float64
 38  totalcovidcasescumulated  32640 non-null  float64
 39  covidpostestprevday  32544 non-null  float64
 40  prevdaytotalcovidcasescumulated  35904 non-null  object
 41  vac1nb  35904 non-null  float64
 42  vac2nb  35904 non-null  float64
 43  Code département  35904 non-null  int64
 44  Insuffisance respiratoire chronique grave (ALD14)  35904 non-null  int64
 45  Insuffisance cardiaque grave, troubles du rythme graves, cardiopathies valvulaires graves, cardiopathies congénitales graves (ALD5)  35904 non-null  int64
 46  Region_y  35904 non-null  object
 47  Departement_y  35904 non-null  object
 48  depnum_y  35904 non-null  int64
 49  Smokers  35904 non-null  float64
 50  Nb_susp_501Y_V1  35904 non-null  int64
 51  Nb_susp_501Y_V2_3  35904 non-null  int64
 52  minority  35904 non-null  float64
 53  pauvrete  35904 non-null  float64
 54  rsa  35904 non-null  float64
 55  ouvriers  35904 non-null  float64
 56  normpm25  35904 non-null  float64
 57  normno2  35904 non-null  float64
 58  normo3  35904 non-null  float64
 59  normpm10  35904 non-null  float64
 60  normco  35904 non-null  float64
 61  1MMaxpm25  35904 non-null  float64
 62  1MMaxno2  35904 non-null  float64
 63  1MMaxo3  35904 non-null  float64
 64  1MMaxpm10  35904 non-null  float64
 65  1MMaxco  35904 non-null  float64
 66  1MMaxnormpm25  35904 non-null  float64
 67  1MMaxnormno2  35904 non-null  float64
 68  1MMaxnormo3  35904 non-null  float64
 69  1MMaxnormpm10  35904 non-null  float64
 70  1MMaxnormco  35904 non-null  float64
 71  hospiprevday  35808 non-null  float64
 72  pm257davg  35904 non-null  float64
 73  no27davg  35904 non-null  float64
 74  o37davg  35904 non-null  float64
 75  pm107davg  35904 non-null  float64
 76  co7davg  35904 non-null  float64
 77  pm251Mavg  35904 non-null  float64
 78  no21Mavg  35904 non-null  float64
 79  o31Mavg  35904 non-null  float64
 80  pm101Mavg  35904 non-null  float64
 81  co1Mavg  35904 non-null  float64
 82  normpm257davg  35904 non-null  float64
 83  normno27davg  35904 non-null  float64
 84  normo37davg  35904 non-null  float64
 85  normpm107davg  35904 non-null  float64
 86  normco7davg  35904 non-null  float64
 87  normpm251Mavg  35904 non-null  float64
 88  normno21Mavg  35904 non-null  float64
 89  normo31Mavg  35904 non-null  float64
 90  normpm101Mavg  35904 non-null  float64
 91  normco1Mavg  35904 non-null  float64
 92  pm25level  35904 non-null  int64
 93  pm25levelstring  35904 non-null  object
dtypes: datetime64[ns](1), float64(70), int64(7), object(16)
memory usage: 25.7+ MB
| | idx | pm25 | no2 | o3 | pm10 | co | pm257davg | no27davg | o37davg | co7davg | ... | normno27davg | normo37davg | normpm107davg | normco7davg | normpm251Mavg | normno21Mavg | normo31Mavg | normpm101Mavg | normco1Mavg | newhospi |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 34 | 631877.0 | 6.403166 | 3.939003 | 43.262828 | 7.967417 | 177.622754 | 7.257504 | 3.476745 | 72.813923 | 161.817954 | ... | 0.046602 | 0.586532 | 0.129071 | 0.234867 | 0.108579 | 0.048632 | 0.559299 | 0.094840 | 0.218359 | 4.0 |
| 35 | 631877.0 | 10.041256 | 4.039703 | 45.365958 | 12.985050 | 177.204810 | 7.283296 | 3.523846 | 71.713745 | 162.361518 | ... | 0.047305 | 0.577448 | 0.127923 | 0.236187 | 0.119927 | 0.051150 | 0.496692 | 0.102941 | 0.226082 | 4.0 |
| 36 | 631877.0 | 8.650893 | 2.993409 | 64.447998 | 10.213892 | 173.986833 | 7.222348 | 3.502823 | 70.928693 | 162.525890 | ... | 0.046991 | 0.570966 | 0.125429 | 0.236587 | 0.129652 | 0.051861 | 0.455272 | 0.105411 | 0.234060 | 1.0 |
| 37 | 631877.0 | 7.924968 | 2.470320 | 81.736362 | 11.228378 | 163.052671 | 7.159819 | 3.487527 | 70.520307 | 162.329540 | ... | 0.046763 | 0.567594 | 0.123235 | 0.236110 | 0.127766 | 0.051226 | 0.458184 | 0.101608 | 0.239713 | 1.0 |
| 38 | 631877.0 | 8.803713 | 2.883282 | 79.918855 | 11.338718 | 186.330957 | 7.143561 | 3.477693 | 69.999498 | 162.848931 | ... | 0.046616 | 0.563293 | 0.120713 | 0.237371 | 0.142927 | 0.049201 | 0.477197 | 0.113024 | 0.255846 | 1.0 |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 34171 | 1215390.0 | 6.100140 | 6.627001 | 70.229467 | 11.613934 | 185.482428 | 14.580851 | 14.493739 | 56.915039 | 207.123258 | ... | 0.211142 | 0.455257 | 0.281847 | 0.344925 | 0.302061 | 0.237179 | 0.471712 | 0.271828 | 0.352307 | 43.0 |
| 34172 | 1215390.0 | 8.425896 | 9.099569 | 68.209629 | 13.743489 | 193.139682 | 14.195306 | 14.223872 | 57.801110 | 206.960739 | ... | 0.207112 | 0.462573 | 0.269762 | 0.344530 | 0.289930 | 0.240049 | 0.480526 | 0.265162 | 0.358714 | 38.0 |
| 34173 | 1215390.0 | 13.033086 | 29.526526 | 54.441418 | 18.517216 | 237.388093 | 14.249303 | 14.775230 | 57.704602 | 207.544318 | ... | 0.215346 | 0.461776 | 0.270202 | 0.345948 | 0.295797 | 0.280570 | 0.463981 | 0.266083 | 0.371198 | 44.0 |
| 34174 | 1215390.0 | 21.792762 | 51.437356 | 40.131381 | 30.011262 | 307.382279 | 14.525749 | 16.057324 | 57.291931 | 210.476992 | ... | 0.234495 | 0.458369 | 0.274637 | 0.353072 | 0.286940 | 0.319473 | 0.461757 | 0.261254 | 0.379313 | 88.0 |
| 34175 | 1215390.0 | 21.862512 | 49.729555 | 42.796275 | 32.744934 | 290.497534 | 14.654314 | 17.360417 | 57.019812 | 213.319222 | ... | 0.253957 | 0.456122 | 0.278918 | 0.359977 | 0.278900 | 0.366677 | 0.452311 | 0.260245 | 0.393208 | 83.0 |
30912 rows × 56 columns
Index(['idx', 'pm25', 'pm257davg', 'normpm25', 'hospiprevday',
'covidpostestprevday', 'prevdaytotalcovidcasescumulated',
'all_day_bing_tiles_visited_relative_change',
'all_day_ratio_single_tile_users', 'vac1nb', 'vac2nb',
'Insuffisance respiratoire chronique grave (ALD14)',
'Insuffisance cardiaque grave, troubles du rythme graves, cardiopathies valvulaires graves, cardiopathies congénitales graves (ALD5)',
'Smokers', 'minority', 'Nb_susp_501Y_V1', 'Nb_susp_501Y_V2_3',
'1MMaxpm25', 'pm251Mavg', 'pauvrete', 'rsa', 'ouvriers'],
dtype='object')
22
The prediction target is the daily number of new hospitalizations due to severe COVID-19 cases in each French département.
[Figures: five plots with y-axis label 'newhospimean', followed by further plots whose titles and axes were not preserved.]
[95.]
Département Numéro Date of pollution peak 1MMaxo3 \
0 Val-d'Oise 95.0 2020-08-09 122.889769
1 Alpes-de-Haute-Provence 4.0 2020-08-08 119.833327
2 Haut-Rhin 68.0 2020-07-31 116.979563
3 Moselle 57.0 2020-08-10 115.873423
4 Ain 1.0 2020-09-18 114.506434
.. ... ... ... ...
90 Pyrénées-Atlantiques 64.0 2020-08-06 94.093549
91 Gard 30.0 2020-05-20 93.974202
92 Cher 18.0 2020-07-13 93.921768
93 Hautes-Pyrénées 65.0 2020-05-30 93.086194
94 Ariège 9.0 2020-05-29 90.297432
totalcovidcasescumulated Population Index
0 237793 1215390.0
1 20858 161799.0
2 79025 762607.0
3 141705 1044486.0
4 109223 631877.0
.. ... ...
90 62382 670032.0
91 107351 738189.0
92 32904 308992.0
93 25307 228582.0
94 14118 152499.0
[95 rows x 6 columns]
[59.]
Département Numéro Date of pollution peak 1MMaxpm25 \
0 Nord 59.0 2020-11-27 39.932960
1 Haut-Rhin 68.0 2021-02-24 37.243984
2 Deux-Sèvres 79.0 2021-03-09 36.380767
3 Paris 75.0 2021-01-02 35.418335
4 Vienne 86.0 2021-03-09 34.895373
.. ... ... ... ...
90 Alpes-Maritimes 6.0 2021-03-06 20.296409
91 Côte-d'Or 21.0 2021-02-23 20.115372
92 Lozère 48.0 2021-03-04 19.935369
93 Alpes-de-Haute-Provence 4.0 2021-02-23 19.765781
94 Ardèche 7.0 2021-03-04 19.407017
totalcovidcasescumulated Population Index
0 477936 2605238.0
1 79025 762607.0
2 35767 374435.0
3 404122 2206488.0
4 38030 434887.0
.. ... ...
90 219293 1082440.0
91 66868 533147.0
92 10367 76309.0
93 20858 161799.0
94 45658 324209.0
[95 rows x 6 columns]
[75.]
Département Numéro Date of pollution peak 1MMaxno2 \
0 Paris 75.0 2021-03-31 67.312539
1 Hauts-de-Seine 92.0 2021-03-02 64.475306
2 Val-de-Marne 94.0 2021-01-08 53.384538
3 Val-d'Oise 95.0 2021-03-02 51.599818
4 Yvelines 78.0 2020-11-26 43.024886
.. ... ... ... ...
90 Lozère 48.0 2021-01-07 7.727260
91 Corse-du-Sud 201.0 2020-12-09 6.285247
92 Ariège 9.0 2021-01-09 6.254226
93 Pyrénées-Orientales 66.0 2021-01-09 6.071631
94 Haute-Corse 202.0 2021-01-09 5.592579
totalcovidcasescumulated Population Index
0 404122 2206488.0
1 266670 1601569.0
2 265897 1372389.0
3 237793 1215390.0
4 212766 1427291.0
.. ... ...
90 10367 76309.0
91 11183 152730.0
92 14118 152499.0
93 45028 471038.0
94 13884 174553.0
[95 rows x 6 columns]
[92.]
Département Numéro Date of pollution peak 1MMaxco \
0 Hauts-de-Seine 92.0 2020-11-26 476.783872
1 Bas-Rhin 67.0 2020-11-10 442.772829
2 Paris 75.0 2020-11-26 442.031472
3 Val-de-Marne 94.0 2021-01-02 400.354289
4 Bouches-du-Rhône 13.0 2021-02-24 364.207868
.. ... ... ... ...
90 Cantal 15.0 2021-01-06 204.699850
91 Lozère 48.0 2021-01-07 202.482931
92 Hautes-Pyrénées 65.0 2021-01-10 200.001928
93 Ariège 9.0 2021-01-11 189.451920
94 Pyrénées-Orientales 66.0 2021-03-06 182.390438
totalcovidcasescumulated Population Index
0 266670 1601569.0
1 138675 1116658.0
2 404122 2206488.0
3 265897 1372389.0
4 404460 2016622.0
.. ... ...
90 11762 146219.0
91 10367 76309.0
92 25307 228582.0
93 14118 152499.0
94 45028 471038.0
[95 rows x 6 columns]
[67.]
Département Numéro Date of pollution peak 1MMaxpm10 \
0 Bas-Rhin 67.0 2021-02-25 74.188288
1 Haut-Rhin 68.0 2021-02-25 71.831104
2 Corse-du-Sud 201.0 2021-02-06 70.996064
3 Vosges 88.0 2021-02-25 70.504318
4 Haute-Saône 70.0 2021-02-25 69.817902
.. ... ... ... ...
90 Mayenne 53.0 2021-03-03 40.262764
91 Eure 27.0 2021-03-03 39.774691
92 Calvados 14.0 2021-03-02 38.762629
93 Sarthe 72.0 2021-03-03 36.942586
94 Orne 61.0 2021-03-02 36.142811
totalcovidcasescumulated Population Index
0 138675 1116658.0
1 79025 762607.0
2 11183 152730.0
3 41055 372016.0
4 28170 237706.0
.. ... ...
90 29192 307940.0
91 65397 601948.0
92 61873 693579.0
93 57288 568445.0
94 29094 286618.0
[95 rows x 6 columns]
Gradient Boosting for regression.
GB builds an additive model in a forward stage-wise fashion; it allows for the optimization of arbitrary differentiable loss functions. In each stage a regression tree is fit on the negative gradient of the given loss function.
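The stage-wise idea described above can be sketched by hand for the squared-error loss, where the negative gradient is simply the residual. This is a minimal illustration on synthetic data, not the project's training code; the learning rate, tree depth, and number of stages are arbitrary assumptions.

```python
import numpy as np
from sklearn.tree import DecisionTreeRegressor

# Synthetic 1-D regression problem (assumption: illustrative data only).
rng = np.random.default_rng(0)
X = rng.uniform(-3, 3, size=(400, 1))
y = np.sin(X[:, 0]) + rng.normal(scale=0.1, size=400)

learning_rate = 0.1  # shrinkage applied to every stage
n_stages = 100

# Stage 0: a constant model; the mean minimizes the squared error.
prediction = np.full_like(y, y.mean())
trees = []
train_mse = [np.mean((y - prediction) ** 2)]

for _ in range(n_stages):
    # For the loss 0.5 * (y - F)^2 the negative gradient w.r.t. F is y - F.
    residual = y - prediction
    tree = DecisionTreeRegressor(max_depth=2, random_state=0)
    tree.fit(X, residual)
    # Additive update: each new tree corrects the current model a little.
    prediction += learning_rate * tree.predict(X)
    trees.append(tree)
    train_mse.append(np.mean((y - prediction) ** 2))
```

`GradientBoostingRegressor` from scikit-learn follows the same scheme and generalizes it to other differentiable losses.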
Stack of estimators with a final regressor.
Stacked generalization consists of stacking the outputs of the individual estimators and using a regressor to compute the final prediction. Stacking makes it possible to exploit the strength of each individual estimator by using their outputs as the input of a final estimator.
Note that estimators_ are fitted on the full X, while final_estimator_ is trained on cross-validated predictions of the base estimators obtained with cross_val_predict.
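As a rough sketch of stacking in scikit-learn, the snippet below combines two tree ensembles under a ridge final estimator. The choice of base estimators, the `RidgeCV` final estimator, and the synthetic data are assumptions made for illustration; the notebook's actual stack is not shown in this output.

```python
import numpy as np
from sklearn.ensemble import (
    ExtraTreesRegressor,
    GradientBoostingRegressor,
    StackingRegressor,
)
from sklearn.linear_model import RidgeCV
from sklearn.model_selection import train_test_split

# Synthetic data (assumption: illustrative only).
rng = np.random.default_rng(0)
X = rng.normal(size=(300, 4))
y = 3.0 * X[:, 0] - 2.0 * X[:, 1] + rng.normal(scale=0.2, size=300)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

stack = StackingRegressor(
    estimators=[
        ("gbr", GradientBoostingRegressor(n_estimators=50, random_state=0)),
        ("et", ExtraTreesRegressor(n_estimators=50, random_state=0)),
    ],
    final_estimator=RidgeCV(),
    cv=5,  # folds used to build the out-of-fold predictions
)
stack.fit(X_train, y_train)

# The final estimator sees one input column per base estimator.
meta_weights = stack.final_estimator_.coef_
r2 = stack.score(X_test, y_test)
```

Training the final estimator on out-of-fold predictions keeps it from simply trusting whichever base model overfits the training set most.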
['idx', 'pm25', 'pm257davg', 'pm251Mavg', '1MMaxpm25', 'pm10', 'pm107davg', 'pm101Mavg', '1MMaxpm10', 'no2', 'no27davg', 'no21Mavg', '1MMaxno2', 'o3', 'o37davg', 'o31Mavg', '1MMaxo3', 'co', 'co7davg', 'co1Mavg', '1MMaxco', 'hospiprevday', 'covidpostestprevday', 'prevdaytotalcovidcasescumulated', 'all_day_bing_tiles_visited_relative_change', 'all_day_ratio_single_tile_users', 'vac1nb', 'vac2nb', 'Insuffisance respiratoire chronique grave (ALD14)', 'Insuffisance cardiaque grave, troubles du rythme graves, cardiopathies valvulaires graves, cardiopathies congénitales graves (ALD5)', 'Smokers', 'minority', 'Nb_susp_501Y_V1', 'Nb_susp_501Y_V2_3', 'pauvrete', 'rsa', 'ouvriers']
T-Pot exported current best pipeline
MSE: 45.27867098480078
MAE: 3.638434493897027
(32544, 1)
(32544, 94)
{'fit_time': array([42.10823298, 50.39555478, 51.73287916, 47.11963868, 55.79833794]), 'score_time': array([0.03028107, 0.03814673, 0.030725 , 0.03765416, 0.03276253]), 'test_neg_mean_squared_error': array([ -45.94187809, -38.60143067, -37.20529146, -104.75231502,
-70.21355169]), 'train_neg_mean_squared_error': array([-31.90270441, -31.29582623, -32.28401689, -25.47495297,
-26.1918782 ]), 'test_neg_mean_absolute_error': array([-3.33832828, -3.49598736, -3.35138463, -5.72416838, -4.5080333 ]), 'train_neg_mean_absolute_error': array([-2.85603936, -2.78007136, -2.82318506, -2.48488596, -2.59955426])}
MSE:
-59.34289338587435
MAE
-4.0835803872517165
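The two values printed above equal the means of the five `test_neg_mean_squared_error` and `test_neg_mean_absolute_error` fold scores. scikit-learn reports these scorers as negated errors, so the sign has to be flipped before reading them as MSE and MAE. The sketch below shows that step on synthetic data; the estimator settings and the data are assumptions, not the notebook's.

```python
import numpy as np
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.model_selection import cross_validate

# Synthetic data (assumption: illustrative only).
rng = np.random.default_rng(0)
X = rng.normal(size=(300, 3))
y = 2.0 * X[:, 0] + rng.normal(scale=0.5, size=300)

scores = cross_validate(
    GradientBoostingRegressor(n_estimators=50, random_state=0),
    X,
    y,
    cv=5,
    scoring=("neg_mean_squared_error", "neg_mean_absolute_error"),
    return_train_score=True,
)

# Scorers follow "higher is better", hence the negation.
mean_mse = -scores["test_neg_mean_squared_error"].mean()
mean_mae = -scores["test_neg_mean_absolute_error"].mean()

# A large gap between train and test error hints at overfitting.
train_mse = -scores["train_neg_mean_squared_error"].mean()
```

In the output above, the test MSE per fold ranges from about 37 to about 105 while the train MSE stays near 25 to 32, so the fold-to-fold spread is worth reporting alongside the mean.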
Scikit Learn - GradientBoostingRegressor:
index feature_importance
33 Nb_susp_501Y_V2_3 0.001061
26 vac1nb 0.001590
30 Smokers 0.001639
34 pauvrete 0.001982
27 vac2nb 0.002108
36 ouvriers 0.002132
20 1MMaxco 0.002190
35 rsa 0.002255
4 1MMaxpm25 0.002278
28 Insuffisance respiratoire chronique grave (ALD14) 0.002470
12 1MMaxno2 0.002489
32 Nb_susp_501Y_V1 0.002535
11 no21Mavg 0.002574
7 pm101Mavg 0.002628
3 pm251Mavg 0.002914
5 pm10 0.002962
9 no2 0.002973
1 pm25 0.003014
6 pm107davg 0.003068
31 minority 0.003239
10 no27davg 0.003298
19 co1Mavg 0.003371
0 idx 0.003443
8 1MMaxpm10 0.003484
15 o31Mavg 0.003508
29 Insuffisance cardiaque grave, troubles du ryth... 0.003550
13 o3 0.003573
17 co 0.003704
2 pm257davg 0.003823
14 o37davg 0.005323
18 co7davg 0.005830
16 1MMaxo3 0.005888
24 all_day_bing_tiles_visited_relative_change 0.008443
25 all_day_ratio_single_tile_users 0.034079
23 prevdaytotalcovidcasescumulated 0.122726
22 covidpostestprevday 0.211698
21 hospiprevday 0.526160
TPOTRegressor
Version 0.11.6.post1 of tpot is outdated. Version 0.11.7 was released Wednesday January 06, 2021.
TPOT closed during evaluation in one generation.
WARNING: TPOT may not provide a good pipeline if TPOT is stopped/interrupted in a early generation.
TPOT closed prematurely. Will use the current best pipeline.
Best pipeline: ExtraTreesRegressor(CombineDFs(input_matrix, input_matrix), bootstrap=False, max_features=0.5, min_samples_leaf=1, min_samples_split=20, n_estimators=100)
-48.56940976377202
The elbow method determines that the optimal number of clusters for PM2.5 Levels is k = 4
0.6375679675824064
39.9329597660099
[0.6375679675824064, 10.46141591718928, 20.285263866796154, 30.10911181640303, 39.9329597660099]
nom pm25levelstring
4983 Nord High
5695 Pas-de-Calais High
6051 Somme High
31683 Paris High
33107 Hauts-de-Seine High
34175 Val-d'Oise High
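The five edges printed above are consistent with equal-width bins between the minimum and maximum PM2.5 values, that is `np.linspace(min, max, k + 1)` with k = 4. The sketch below reproduces those edges and assigns a level to a few sample values. It is an assumption that the notebook assigns levels this way; the middle labels "Moderate" and "Elevated" and the sample values are hypothetical, since only "Low" and "High" appear in the output.

```python
import numpy as np
import pandas as pd

k = 4  # number of levels reported by the elbow method above
pm25_min = 0.6375679675824064  # minimum printed above
pm25_max = 39.9329597660099    # maximum printed above

# k equal-width intervals need k + 1 edges.
edges = np.linspace(pm25_min, pm25_max, k + 1)

# Hypothetical label names for the two middle levels.
labels = ["Low", "Moderate", "Elevated", "High"]

# Hypothetical PM2.5 concentrations to classify.
sample = pd.Series([0.64, 6.4, 15.0, 25.0, 39.9], name="pm25")
levels = pd.cut(sample, bins=edges, labels=labels, include_lowest=True)
codes = levels.cat.codes  # 0 for "Low" up to 3 for "High"
```

`include_lowest=True` keeps the minimum value itself inside the first bin, which `pd.cut` would otherwise exclude.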
OK
Part of our product is dedicated to predicting the daily number of new hospitalizations due to severe COVID-19 cases in each département. These predictions are made by a machine learning model tuned with an AutoML optimizer (TPOT), and they help hospitals and clinics plan emergency transfers of patients to other locations in case of overcrowding.
The contagiousness of the virus (captured by the previous-day hospitalization and case-count features) and Facebook's mobility index lead our model's feature importance report. Data visualization nevertheless makes it clear that the mean number of new hospitalizations due to severe COVID-19 cases across all French départements is an increasing function of the PM2.5 1-month maximum and of PM10 7-day average differentials. Furthermore, unusually high ground-level ozone concentrations seem to act as a trigger for the epidemic.
Our model makes predictions from live data flows comprising a set of 22 features: some are ground-level atmospheric pollutant concentrations, others reflect the prevalence or incidence of variants, and others the impact of the vaccination campaign. Temperature and humidity data will be added soon.
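Several of these features are lagged or rolling versions of raw columns, as names such as `hospiprevday`, `pm257davg`, and `1MMaxpm25` suggest. The sketch below shows one way to derive them per département with pandas. The exact definitions (a 1-day lag, a 7-row mean, a 30-row maximum) are assumptions inferred from the column names, and the pollution values are made up for illustration.

```python
import pandas as pd

days = pd.to_datetime(["2020-05-14", "2020-05-15", "2020-05-16", "2020-05-17"])
df = pd.DataFrame(
    {
        "nom": ["Ain"] * 4 + ["Nord"] * 4,
        "time": list(days) * 2,
        "hospi": [137.0, 135.0, 134.0, 133.0, 500.0, 490.0, 480.0, 470.0],
        "pm25": [6.4, 10.0, 8.7, 7.9, 12.0, 14.0, 13.0, 11.0],
    }
)

# Order rows in time within each département before shifting or rolling.
df = df.sort_values(["nom", "time"]).reset_index(drop=True)

# Previous-day value: the first day of each département has no predecessor.
df["hospiprevday"] = df.groupby("nom")["hospi"].shift(1)

# Rolling statistics computed per département, never across départements.
df["pm257davg"] = df.groupby("nom")["pm25"].transform(
    lambda s: s.rolling(7, min_periods=1).mean()
)
df["1MMaxpm25"] = df.groupby("nom")["pm25"].transform(
    lambda s: s.rolling(30, min_periods=1).max()
)
```

Grouping before shifting leaves exactly one missing previous-day value per département, which agrees with `hospiprevday` having 96 fewer non-null entries than the other columns in the `info()` output.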
Our algorithm is first trained on the longest available history, obtained by merging features streamed from many data sources and providers into our database. Training will be relaunched frequently so that the algorithm keeps learning from new data. An API will load the latest model and feed all the necessary outputs, including GIS elements, directly to the company's product website.
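One simple way to support this retrain-then-serve cycle is to save each trained model under a date tag and let the API load the most recent file. The sketch below uses `joblib` for persistence. The file naming scheme and the helper names `train_and_save` and `load_latest` are hypothetical, and the data is synthetic.

```python
import tempfile
from pathlib import Path

import joblib
import numpy as np
from sklearn.ensemble import GradientBoostingRegressor


def train_and_save(X, y, model_dir, tag):
    """Fit a model and store it under a date-like tag (hypothetical scheme)."""
    model = GradientBoostingRegressor(n_estimators=20, random_state=0)
    model.fit(X, y)
    path = Path(model_dir) / f"model_{tag}.joblib"
    joblib.dump(model, path)
    return path


def load_latest(model_dir):
    """Load the newest model; ISO dates sort chronologically as strings."""
    paths = sorted(Path(model_dir).glob("model_*.joblib"))
    if not paths:
        raise FileNotFoundError(f"no saved model in {model_dir}")
    return joblib.load(paths[-1]), paths[-1]


# Synthetic data (assumption: illustrative only).
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 3))
y = X[:, 0] + rng.normal(scale=0.1, size=200)

model_dir = tempfile.mkdtemp()
train_and_save(X[:100], y[:100], model_dir, "2021-04-16")  # earlier run
train_and_save(X, y, model_dir, "2021-04-17")              # retrain on more data

latest_model, latest_path = load_latest(model_dir)
predictions = latest_model.predict(X[:5])
```

Keeping older files around makes it possible to roll back if a retrained model performs worse than its predecessor.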
Our tool also ranks French départements by pollution level and raises alerts when PM2.5, PM10, or other pollutant levels are abnormally high. The optimal number of levels is determined with the K-Means clustering elbow method. These alerts are translated into recommendations so that the government can automatically take measures to stop heavy traffic pollution from non-electric cars and trucks until the monitoring team determines that live pollutant levels have fallen back to a safe cluster interval.
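The elbow method fits K-Means for increasing k and looks for the point where the inertia (within-cluster sum of squares) stops dropping sharply. The sketch below runs it on synthetic one-dimensional data with four well-separated groups. The elbow is usually read off a plot; the 30% relative-drop threshold used here to pick it automatically is a hypothetical heuristic, not the notebook's rule.

```python
import numpy as np
from sklearn.cluster import KMeans

# Synthetic concentrations drawn around four centres (assumption: toy data).
rng = np.random.default_rng(0)
centres = [5.0, 15.0, 25.0, 35.0]
values = np.concatenate(
    [rng.normal(c, 1.0, size=100) for c in centres]
).reshape(-1, 1)

# Inertia for k = 1..8.
inertias = {}
for k in range(1, 9):
    km = KMeans(n_clusters=k, n_init=10, random_state=0).fit(values)
    inertias[k] = km.inertia_

# Hypothetical rule: the elbow is the first k for which adding one more
# cluster reduces the inertia by less than 30%.
elbow_k = None
for k in range(1, 8):
    relative_drop = (inertias[k] - inertias[k + 1]) / inertias[k]
    if relative_drop < 0.30:
        elbow_k = k
        break
```

On real pollutant data the curve is rarely this clean, so plotting `inertias` and checking the chosen k by eye remains advisable.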
Data pre-processing:
Population ... OK
Covid ... numero hospi reanim newhospi newreanim deces \
0 1.0 190.142857 22.000000 12.428571 3.142857 575.000000
1 2.0 319.571429 57.000000 22.714286 4.428571 952.571429
2 3.0 93.285714 18.857143 8.857143 1.142857 526.142857
3 4.0 150.000000 7.142857 6.285714 0.428571 230.000000
4 5.0 118.857143 12.714286 4.571429 0.571429 238.285714
.. ... ... ... ... ... ...
96 971.0 110.857143 21.571429 11.571429 2.000000 217.714286
97 972.0 111.428571 27.142857 11.285714 3.285714 66.571429
98 973.0 39.857143 11.714286 2.571429 0.571429 91.571429
99 974.0 143.714286 38.428571 9.000000 3.142857 148.142857
100 976.0 13.428571 6.285714 0.857143 0.285714 124.000000
gueris
0 2477.000000
1 3512.571429
2 1835.571429
3 966.571429
4 1052.714286
.. ...
96 989.428571
97 562.000000
98 2303.142857
99 1329.285714
100 1228.000000
[101 rows x 7 columns]
reg dep com article com_nom lon lat \
0 11 75 56 NaN PARIS 2.352222 48.856614
1 11 77 1 NaN ACHERES-LA-FORET 2.570289 48.354976
2 11 77 10 NaN AUBEPIERRE-OZOUER-LE-REPOS 2.890552 48.632323
3 11 77 100 LE CHATELET-EN-BRIE 2.792095 48.504945
4 11 77 101 NaN CHATENAY-SUR-SEINE 3.096229 48.418774
... ... ... ... ... ... ... ...
36313 94 202 9 NaN ALERIA 9.512429 42.104248
36314 94 202 93 NaN CORBARA 8.907482 42.615508
36315 94 202 95 NaN CORSCIA 9.042592 42.354646
36316 94 202 96 NaN CORTE 9.149022 42.309409
36317 94 202 97 NaN COSTA 9.001945 42.574916
total idx hospi
0 2.240213e+06 1 1743.142857142857
1 1.285127e+03 1 701.5714285714286
2 8.993365e+02 1 701.5714285714286
3 4.454928e+03 1 701.5714285714286
4 8.795762e+02 1 701.5714285714286
... ... ... ...
36313 2.005203e+03 0.128571 25.714285714285715
36314 1.002777e+03 0.128571 25.714285714285715
36315 1.833197e+02 0.128571 25.714285714285715
36316 6.756341e+03 0.128571 25.714285714285715
36317 6.962550e+01 0.128571 25.714285714285715
[36318 rows x 10 columns]
OK
PM2.5 ... OK
<xarray.DataArray 'pm2p5_conc' (com: 36318)>
array([16.15899158, 16.05053063, 15.57088451, ..., 5.90282703,
6.10779241, 6.66385892])
Coordinates:
time <U10 '2021-04-17'
longitude (com) float64 2.352 2.57 2.891 2.792 ... 8.907 9.043 9.149 9.002
latitude (com) float64 48.86 48.35 48.63 48.5 ... 42.62 42.35 42.31 42.57
* com (com) int64 0 1 2 3 4 5 6 ... 36312 36313 36314 36315 36316 36317
17.648726227446026
Data pre-processing:
Population ... OK
Covid ... numero hospi reanim newhospi newreanim deces \
0 1.0 190.142857 22.000000 12.428571 3.142857 575.000000
1 2.0 319.571429 57.000000 22.714286 4.428571 952.571429
2 3.0 93.285714 18.857143 8.857143 1.142857 526.142857
3 4.0 150.000000 7.142857 6.285714 0.428571 230.000000
4 5.0 118.857143 12.714286 4.571429 0.571429 238.285714
.. ... ... ... ... ... ...
96 971.0 110.857143 21.571429 11.571429 2.000000 217.714286
97 972.0 111.428571 27.142857 11.285714 3.285714 66.571429
98 973.0 39.857143 11.714286 2.571429 0.571429 91.571429
99 974.0 143.714286 38.428571 9.000000 3.142857 148.142857
100 976.0 13.428571 6.285714 0.857143 0.285714 124.000000
gueris
0 2477.000000
1 3512.571429
2 1835.571429
3 966.571429
4 1052.714286
.. ...
96 989.428571
97 562.000000
98 2303.142857
99 1329.285714
100 1228.000000
[101 rows x 7 columns]
reg dep com article com_nom lon lat \
0 11 75 56 NaN PARIS 2.352222 48.856614
1 11 77 1 NaN ACHERES-LA-FORET 2.570289 48.354976
2 11 77 10 NaN AUBEPIERRE-OZOUER-LE-REPOS 2.890552 48.632323
3 11 77 100 LE CHATELET-EN-BRIE 2.792095 48.504945
4 11 77 101 NaN CHATENAY-SUR-SEINE 3.096229 48.418774
... ... ... ... ... ... ... ...
36313 94 202 9 NaN ALERIA 9.512429 42.104248
36314 94 202 93 NaN CORBARA 8.907482 42.615508
36315 94 202 95 NaN CORSCIA 9.042592 42.354646
36316 94 202 96 NaN CORTE 9.149022 42.309409
36317 94 202 97 NaN COSTA 9.001945 42.574916
total idx hospi
0 2.240213e+06 1 1743.142857142857
1 1.285127e+03 0.393529 701.5714285714286
2 8.993365e+02 0.393529 701.5714285714286
3 4.454928e+03 0.393529 701.5714285714286
4 8.795762e+02 0.393529 701.5714285714286
... ... ... ...
36313 2.005203e+03 0 25.714285714285715
36314 1.002777e+03 0 25.714285714285715
36315 1.833197e+02 0 25.714285714285715
36316 6.756341e+03 0 25.714285714285715
36317 6.962550e+01 0 25.714285714285715
[36318 rows x 10 columns]
OK
PM2.5 ... OK
<xarray.DataArray 'pm2p5_conc' (com: 36318)>
array([16.15899158, 16.05053063, 15.57088451, ..., 5.90282703,
6.10779241, 6.66385892])
Coordinates:
time <U10 '2021-04-17'
longitude (com) float64 2.352 2.57 2.891 2.792 ... 8.907 9.043 9.149 9.002
latitude (com) float64 48.86 48.35 48.63 48.5 ... 42.62 42.35 42.31 42.57
* com (com) int64 0 1 2 3 4 5 6 ... 36312 36313 36314 36315 36316 36317